ABSTRACT


This  report  describes  the  statistical  design  of  a sample  survey  to 
monitor  the  condition  of  microfilm  in  a large  collection  maintained 
by  the  National  Archives.  The  design  criterion  developed  for  the 
survey  ensures  that  the  number  of  rolls  of  film  inspected  will  be 
large  enough  to  achieve  a pre-chosen  probability  of  detecting  a 
specified  amount  of  damaged  film  that  might  exist  in  the  population. 
Tables  and  formulas  are  given  to  satisfy  the  design  criterion  under 
a range  of  conditions.  Other  practical  aspects  of  survey  design  are 
discussed  including  the  sampling  frame,  stratification,  sample 
selection  procedure,  pilot  testing,  and  the  use  of  replicated 
sampling.


Key Words and Phrases: Sample Size Determination, Sampling Fraction,
Sampling Frame, Sample Survey, Statistics, Stratification.


Note: This document was submitted to the National Archives and Records
Administration in 1987 as a letter report. This 1988 printing as an
Interagency Report of the National Institute of Standards and Technology
responds to requests for wider circulation of this material.

1. Introduction: Goals of the Survey


Regular  surveys  of  microfilm  collections  are  necessary  in  order  to 
monitor  the  condition  of  the  film  and  to  detect  any  serious  problems  of 
degradation  or  other  damage  to  the  records.  This  report  deals  specifically 
with  the  design  of  a survey  for  the  large  microfilm  collection  maintained  by 
the  National  Archives  and  Records  Administration  (NARA)  at  National 
Underground Storage, Inc. (NUS) in Boyers, Pennsylvania.

The  practical  survey  procedure  envisaged  consists  of  two  separate  phases. 
The  first  phase  has  the  purpose  of  detecting  and  identifying  any  parts  of  the 
microfilm  collection  where  serious  degradation  has  occurred.  After  this 
initial  broad  inspection  is  carried  out,  a follow-up  investigation  will  be 
conducted  as  the  second  phase  of  the  survey.  The  purpose  of  the  second  phase 
follow-up  study  is  to  evaluate  the  extent  of  any  problems  discovered  in  the 
first  phase  and  to  recommend  or  initiate  appropriate  corrective  action.  This 
report  addresses  the  statistical  aspects  of  designing  the  first  phase  survey 
of  the  microfilm. 

As  thus  described,  the  purpose  of  this  survey  is  somewhat  different  in 
character  from  many  typical  sample  surveys.  It  is  more  common  for  a survey  to 
be  conducted  for  the  purpose  of  estimating  the  percentage  or  number  of  items 
in  a population  that  belong  to  a given  category  (e.g.  damaged)  than  for  the 
purpose  of  detecting  or  locating  the  presence  of  damaged  items. 

This  report  describes  a method  of  statistically  assessing  the  merits  of  a 
given  sampling  design  in  terms  of  the  probability  of  detecting  a group  of 
damaged  microfilm  rolls  in  a population  of  rolls.  It  is  found  that  a rule 
based  on  selecting  a fixed  proportion  of  the  population,  no  matter  how  large 
or  small  the  population,  is  a statistically  defensible  approach.  Some  of  the 
statistical  properties  of  this  approach  are  characterized. 

2. Characteristics of the NUS Population. Sample Units and Frame


The  storage  area  at  NUS  consists  of  an  underground  vault  in  a tunnel  of  a 
limestone  mine.  The  storage  area  covers  about  5000  square  feet  and  contains 
about  40  rows  of  shelving,  with  six  to  ten  compartments  per  row. 

A summary description of the shelving and its contents is given in
Appendix 1. The summary is based on notes taken by Alan Calmes when he
visited the site in August, 1985. A sketched map of the facility is included
in the Appendix.

Based on the summary in Appendix 1, there are about 124,000 rolls of
microfilm at NUS, of which about 50,000 are 16 mm format and 74,000 are 35 mm
format.

The  standard  inspection  procedure  for  microfilm,  as  described  in  McCamy 
(1964)  in  the  context  of  inspection  for  aging  blemishes,  can  be  used  for 
evaluating  the  condition  of  individual  rolls  of  microfilm.  Since  that 
procedure  is  designed  for  evaluating  the  film  on  a 100  foot  roll  basis,  the 
appropriate  sampling  unit  for  this  survey  is  the  100  foot  roll.  This  means 
that  a workable  procedure  will  be  needed  for  inspecting  a randomly  chosen  100' 
roll  that  is  spliced  together  with  nine  other  100'  rolls  on  a 1000'  core. 

A critical  element  in  the  successful  execution  of  a survey  of  any 
population  is  the  existence  of  an  adequate  frame.  A frame  is  a literal  or 
conceptual  list  that  contains  exactly  one  entry  for  each  and  every  member  of 
the  population  to  be  sampled.  In  the  present  context,  an  adequate  frame  would 
have  to  be  capable  of  uniquely  identifying  every  100'  roll  of  film  in  the  NUS 
storage  facility.  A number  of  finding  aids  and  other  forms  of  intellectual 
control  related  to  the  contents  of  the  microfilms  exist.  Unfortunately,  none 
of  these  forms  a complete  listing  of  the  population  of  microfilm. 

A computer listing of the microfilm by "Control Number" exists. This
listing  appears  to  be  a useful  starting  point,  but  is  not  fully  adequate  as  a 
sampling  frame  at  present.  One  important  deficiency  of  the  Control  Number 
list  is  that  there  is  apparently  no  simple  and  accurate  way  to  figure  out  how 
many  100'  rolls  of  film  are  represented  by  each  entry  on  the  list.  In 
addition,  it  is  not  possible  to  manipulate  the  list  (e.g.  sort  it,  create 
statistical  summaries  of  the  data  it  contains)  owing  to  the  inflexibility  of 
the  computer  hardware  and/or  software  environment  in  which  it  resides. 

An  improved  version  of  the  computerized  Control  Number  list  would  be  a 
useful  tool  to  aid  in  conducting  periodic  inspection  surveys  of  the  microfilm 
population.  A well-designed  and  maintained  computer  index  to  the  population 
could  also  be  useful  in  other  ways,  including  recording  and  monitoring  the 
results  of  the  periodic  inspections  themselves.  An  adequate  computerized  list 
should  contain  at  least  the  following  information: 

• Identification number, ideally at the individual roll level

• Number of rolls represented by each entry, if more than 1

• Source of microfilm (NARA or Other)

• Film size (16 mm or 35 mm)

• Film type (camera negative, duplicate negative, duplicate
positive, etc.)

• Storage location

• Year produced (at least grouped into the major categories
defined by: before 1955, 1956-1965, and after 1965)

For  the  purposes  of  conducting  the  inspection  survey,  it  is  not  necessary  to 
include  information  on  the  intellectual  content  of  the  film  rolls,  but  such 
information  is  essential  for  other  purposes. 

A very  important  use  for  the  information  contained  in  the  frame  will  be 
in  aiding  the  (second  phase)  follow-up  investigations  that  will  track  down  any 
significant  problems  or  deterioration  detected  in  the  (first  phase)  survey. 

The  identification  of  a problem  in  the  survey  sample  would  lead  to  follow-up 
checks  of  other  rolls  of  film  that  belong  to  the  same  Control  Number,  or  are 
housed  in  the  same  box,  or  were  produced  at  the  same  time  by  the  same 
supplier,  or  have  some  other  salient  characteristic  in  common  with  the  sample 
roll(s). For convenience in what follows, microfilm rolls that can be
logically linked together through sharing some such characteristic will be
referred to as a "logical group" (or simply a "group" when the context is
clear). To enable efficient use of the logical group structure in the survey,
it  will  be  important  that  the  computerized  list,  or  sampling  frame,  contain 
the  necessary  information  for  locating  all  rolls  in  the  population  that  belong 
to  the  same  logical  group  as  any  sampled  roll. 

3. Selection of Sample Size: Probability of Detecting a Problem


An  important  consideration  in  the  planning  of  a statistical  survey  is  the 
choice  of  the  sample  size,  which  in  the  present  case  amounts  to  the  number  of 
rolls  of  microfilm  from  the  population  to  be  inspected.  Clearly,  as  the 
sample  size  increases,  the  precision  of  the  inference  from  the  sample  data  to 
the  population  increases  (as  long  as  the  quality  of  the  inspection  procedure 
does not degrade with increasing sample size). Therefore, the ideal of
inspecting  the  whole  population  must  be  balanced  against  the  limited  resources 
available  for  conducting  the  inspection. 

The  data  from  a sample  survey  are  typically  put  to  many  uses,  all  of 
which  are  affected  by  the  sample  size.  In  order  to  treat  the  choice  of  sample 
size  mathematically,  it  is  necessary  to  adopt  a model  for  the  population  and 
sampling  procedure  and  to  study  the  intended  primary  use  of  the  survey  data  in 
terms  of  that  model. 

Suppose  that  there  exist  in  the  population  some  number  "A"  of  rolls  of 
microfilm  that  are  significantly  deteriorated,  or  are  at  substantial  risk  of 
deterioration.  In  the  statistical  quality  control  literature  related  to  this 
problem,  these  rolls  correspond  to  defective  items  in  a lot  of  manufactured 
goods.  For  brevity  such  rolls  will  be  designated  as  "defective"  rolls  in  this 
report.  If  the  individual  defective  rolls  are  isolated  and  randomly 
distributed  throughout  the  population,  then  a survey  sample  has  a very  small 
chance  of  solving  the  problem  that  they  represent,  for  this  would  require  that 
the  sample  identify  all  of  the  defective  rolls  in  the  population  in  order  for 
appropriate  action  to  be  taken  in  each  instance.  Fortunately,  the  practical 
situation  is  more  favorable,  because  a single  defective  roll  identified  in  the 
survey  will  typically  lead  to  the  discovery  of  others  like  it  in  the  follow-up 
process.  It  is  fruitful  to  model  this  aspect  of  the  survey. 

As described in Section 2, the rolls of microfilm can be categorized in
several  ways  and  conceptually  divided  into  "logical  subgroups"  that  have 
certain  properties  in  common.  As  a starting  point,  consider  the  case  in  which 
all  of  the  defective  rolls  of  microfilm  belong  to  a single  identifiable  group. 
It  is  not  necessary  to  assume  that  the  group  consists  exclusively  of  defective 
rolls;  the  model  and  mathematical  development  to  follow  only  depend  on  the 
number  of  defective  rolls  contained  in  the  group.  For  this  reason,  and  for 
simplicity,  the  following  development  will  focus  on  only  the  defective  rolls 
in  a group,  often  speaking  as  though  the  group  consists  entirely  of  defective 
rolls.


Conceptually,  if  the  sample  inspection  procedure  identifies  at  least  one 
defective roll of film, then the follow-up phase of the survey project can
examine  the  entirety  of  the  logical  group  to  which  the  defective  roll  belongs 
and  appropriately  respond  to  the  identified  problem  throughout  that  group.  In 
a sense,  the  main  job  of  the  survey,  then,  is  to  discover  any  such  problems 
that  exist  within  the  population.  The  sample  size  required  to  satisfy  the 
needs  of  the  survey  can  be  determined  by  specifying  the  probability  that  a 
group  containing  at  least  a specified  number  of  defective  rolls  will  be 
discovered.

The following symbols are used to denote the quantities of interest:

n = sample size (number of rolls inspected)

N = population size (total number of rolls in storage at NUS)

f = n/N, the "sampling fraction"

A = number of defective rolls in the population

P_d = probability of detection of the problem represented by the
      group of defective rolls (i.e., the probability that one
      or more defective rolls is selected into the random
      sample)

Derivations of the formulas discussed in this section and in Section 4 are
given in Appendix 3.

Under  the  assumption  of  simple  random  sampling,  the  probability  of 
detection  is 

    P_d = 1 - (1 - f)^A.    (3.1)

In this equation a key role is played by the proportion of the population
sampled, f = n/N, which is called the sampling fraction in the statistical
literature.

Figure 1 shows a plot of P_d versus the sampling fraction for several
values of A. The plot illustrates that the probability of detection increases
as the sampling fraction increases and as the number of defective rolls
increases.
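
Equation (3.1) can be checked numerically. The sketch below is an illustration only and is not part of the original report: the function names are hypothetical, and the comparison assumes that the exact probability under simple random sampling without replacement is the hypergeometric expression 1 - C(N-A, n)/C(N, n), to which (1 - f)^A is a close approximation when A is small relative to N.

```python
def detection_probability(f, A):
    """Equation (3.1): P_d = 1 - (1 - f)^A."""
    return 1.0 - (1.0 - f) ** A


def exact_detection_probability(N, n, A):
    """Hypergeometric probability that a simple random sample of n rolls,
    drawn without replacement from N, contains at least one of A defectives.
    Assumes n + A <= N."""
    p_none = 1.0
    for i in range(A):
        # conditional chance that the next defective roll also falls
        # outside the sample, given that the previous i did
        p_none *= (N - n - i) / (N - i)
    return 1.0 - p_none


if __name__ == "__main__":
    N, n, A = 100_000, 1_500, 200   # illustrative values only
    f = n / N
    print(f"equation (3.1)  P_d = {detection_probability(f, A):.4f}")
    print(f"hypergeometric  P_d = {exact_detection_probability(N, n, A):.4f}")
```

For values in the range considered in this report the two results differ only slightly, which suggests that the sampling fraction alone is a reasonable summary of a design.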

[Figure 1. Plot of probability of detection, P_d, vs. sampling
fraction, f, for various numbers of defective rolls, A. For a fixed
number of defectives, the probability of detection increases as the
sampling fraction increases. For a given sampling fraction, P_d
decreases as the number of defectives in the population decreases.]

If the values of P_d and A are specified, it follows from equation (3.1)
that the sampling fraction is given by

    f = 1 - (1 - P_d)^{1/A}.    (3.2)

The interpretation of equation (3.2) is that the appropriate sample size,
n, depends on the value of the population size, N. In fact, it says that n is
directly proportional to N, the proportionality factor being determined by
P_d and A as specified by (3.2). Therefore, the values of P_d and A determine
the sampling fraction and so the required sample size, n, can be calculated
as a fixed fraction of the population size, N.

Table 3.1 gives the values of the sampling fraction for a selection of
values of P_d and A.


                              Table 3.1.

    Sampling Fraction, f, for Selected Values of P_d and A, Under the
      Assumption That All Defective Rolls of Microfilm Belong to a
                       Single Identifiable Group


  No. of Defective            Probability of Detection, P_d
  Rolls, A              50%       75%       90%       95%       99%

       30              .023      .045      .074      .095      .142
       50              .014      .027      .045      .058      .088
      100              .0069     .014      .023      .030      .045
      200              .0035     .0069     .011      .015      .023
      300              .0023     .0046     .0076     .0099     .015
      600              .0012     .0023     .0038     .0050     .0076
     1000              .00069    .0014     .0023     .0030     .0046
     3000              .00023    .00046    .00077    .0010     .0015

As an example, suppose we wish to ensure that the probability of
detection is at least 95% for a group containing 200 defective rolls. From
Table 3.1, the appropriate value of the sampling fraction is f = .015, or
1.5%. Thus, if the population consists of a total of N = 100,000 rolls of
microfilm, the sample size should be 1.5% of 100,000 or n = 1500. Similarly,
for a population of N = 40,000 rolls, the sample size should be n = 600 (which
is 1.5% of 40,000).
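
The worked example above can be reproduced with a few lines of code. This is a minimal sketch, not the report's own procedure: the function names are hypothetical, and the sample size is computed from the rounded fraction listed in Table 3.1 (the unrounded value from equation (3.2) is slightly smaller).

```python
import math


def sampling_fraction(p_d, A):
    """Equation (3.2): f = 1 - (1 - P_d)^(1/A)."""
    return 1.0 - (1.0 - p_d) ** (1.0 / A)


def sample_size(N, f):
    """Number of rolls to inspect, rounded up to a whole roll.
    The inner round() suppresses floating-point noise before the ceiling."""
    return math.ceil(round(f * N, 9))


if __name__ == "__main__":
    f = sampling_fraction(0.95, 200)
    f_table = round(f, 3)            # Table 3.1 lists the rounded value
    print(f"f = {f:.5f} (tabulated as {f_table})")
    for N in (100_000, 40_000):
        print(f"N = {N}: n = {sample_size(N, f_table)}")
```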

4. Sample Size When Defective Rolls Belong to Several Logical Groups


In  practice,  it  is  unlikely  that  the  defective  rolls  of  microfilm  in  the 
population  will  all  belong  to  a single  identifiable  group.  If  the  defective 
rolls  are  distributed  across  several  groups,  the  follow-up  phase  of  the 
inspection  plan  will  locate  all  of  them  only  if  at  least  one  member  of  each 
and  every  defective  group  is  identified  by  the  sample  inspection.  It  seems 
intuitively  clear  that  a larger  sample  size  will  be  needed  to  achieve  a 
reasonable  probability  of  detecting  all  of  the  logical  groups  containing 
defectives  in  this  case. 

One  simple  way  to  extend  the  probability  model  to  analyze  the  situation 
involving  several  groups  is  to  assume  that  the  groups  are  all  equal  in  size. 
Specifically,  let 

k = number of groups containing defective rolls

A/k = number of defective rolls per group

P_d(k) = probability of detecting all of the defective groups
         (i.e., the probability that one or more members of each
         defective group are included in the sample)

In this case, it can be shown (section A3.2 of Appendix 3) that the
probability of simultaneously detecting all of the defective groups is given
(approximately) by

    P_d(k) = [1 - (1 - f)^{A/k}]^k.    (4.1)

Again the sampling fraction, f = n/N, plays a key role in determining the
probability of detection. The plot of P_d(k) versus f in Figure 2 illustrates
how, for a fixed number of defective rolls in the population (A = 300 is
shown), the probability of simultaneous detection decreases as the number of
groups  increases.  Putting  it  another  way,  the  sampling  fraction,  f,  must  be 
substantially  larger  if  a given  number  of  defective  rolls  are  distributed 
across  several  groups  compared  to  the  case  of  all  defectives  belonging  to  a 
single  group. 

As  was  true  in  the  case  of  a single  group,  the  equation  for  probability 
of  detection  (4.1)  can  be  solved  for  the  sampling  fraction  in  terms  of  the 
other  parameters.  The  solution  is 

    f = 1 - [1 - P_d(k)^{1/k}]^{k/A}.    (4.2)

Formula (4.2) can be used to determine the appropriate sampling fraction for
specified values of A, k, and P_d(k). From the sampling fraction and the
population size, N, the sample size, n, can be found by multiplying the
population size by the sampling fraction: n = f x N.
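
Equations (4.1) and (4.2) are inverses of one another, which can be confirmed with a short round-trip calculation. The following is a sketch under the equal-group-size assumption stated above; the function names are hypothetical and the choice of a 95% target is only for illustration.

```python
def detection_probability_groups(f, A, k):
    """Equation (4.1): P_d(k) = [1 - (1 - f)^(A/k)]^k (approximate)."""
    return (1.0 - (1.0 - f) ** (A / k)) ** k


def sampling_fraction_groups(p_dk, A, k):
    """Equation (4.2): f = 1 - [1 - P_d(k)^(1/k)]^(k/A)."""
    return 1.0 - (1.0 - p_dk ** (1.0 / k)) ** (k / A)


if __name__ == "__main__":
    A = 300  # total number of defective rolls, as in Figure 2
    for k in (1, 5, 10):
        f = sampling_fraction_groups(0.95, A, k)
        # Round trip: substituting f back into (4.1) should recover 0.95.
        p = detection_probability_groups(f, A, k)
        print(f"k = {k:2d}  f = {f:.4f}  P_d(k) = {p:.3f}")
```

With k = 1 both functions reduce to equations (3.1) and (3.2), so the single-group case needs no separate code.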

[Figure 2. Plot of probability of simultaneous detection, P_d(k), vs.
sampling fraction, f, showing dependence on number of groups, k. The plot is
constructed assuming that a total of 300 defective rolls are equally
distributed among k groups. The probability of simultaneously
detecting all k groups in a single survey decreases as the number of
groups increases.]

5. Discussion of the Sample Size Criteria


Current  regulations  relating  to  inspection  of  microform  records  (NARA, 
1985,  section  1230.22)  derive  from  a rule  that  the  sample  shall  constitute  a 
1%  sample  of  the  population.  This  rule,  which  was  originally  proposed  on 
intuitive  grounds,  and  without  a rigorous  statistical  basis,  is  consistent 
with  the  model  and  theoretical  development  outlined  in  Sections  3 and  4 of 
this  report.  Additional  detail  on  the  nature  of  these  regulations  is 
contained  in  Appendix  2,  where  selected  portions  of  a 1986  draft  revision  to 
the  regulations  have  been  reproduced. 

The  theoretical  development  in  this  report  provides  a statistical 
justification  for  a sample  size  rule  based  on  setting  a fixed  sampling 
fraction  that  will  be  used  with  populations  of  varying  sizes.  Equations  (3.1) 
and  (4.1)  provide  a statistical  interpretation  of  the  merits  of  any  sampling 
plan  specified  by  a given  sampling  fraction.  Candidate  sampling  plans  can  be 
evaluated  in  terms  of  the  probability  of  detecting  a set  of  A defective  items, 
where  the  number  A is  chosen  to  be  relevant  to  a particular  situation.  Table 
5.1  presents  the  probabilities  of  detection  for  a range  of  sampling  fractions 
and  group  sizes.  The  table  contains  separate  entries  corresponding  to  the 
assumptions  that  all  defective  items  belong  to  one  group  [using  equation 
(3.1)]  or  that  the  defective  items  are  distributed  across  5 or  10  groups 
[using  (4.1)]. 

Table  5.1  shows,  for  example,  that  if  there  are  A=100  defective  rolls  of 
film  in  a population,  a 5%  random  sample  has  a 99.4%  chance  of  detecting  at 
least  one  of  them.  If  those  100  defectives  all  belong  to  a single  logical 
group, they could all be found (and presumably fixed) by a follow-up
inspection of 100% of the group to which the sample defective(s) belong. If
the 100 defective rolls are distributed in 5 distinct (but individually
identifiable) groups (of 20 each), that same 5% random sample of the
population has only a 10.9% chance of detecting all 5 groups. Further, if the
100 defectives are situated in 10 groups (of 10 each), then the 5% random
sample has only a 0.01% chance of locating at least one member of each of the
10  groups. 
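
The three probabilities quoted in this paragraph follow from equations (3.1) and (4.1) with f = 0.05 and A = 100. The intermediate powers shown here are rounded and are given only as a check on the arithmetic:

```latex
\begin{align*}
k = 1:\quad  & P_d = 1 - (0.95)^{100} \approx 1 - 0.0059 = 0.994 \\
k = 5:\quad  & P_d(5) \approx \left[1 - (0.95)^{20}\right]^{5}
               \approx (0.6415)^{5} \approx 0.109 \\
k = 10:\quad & P_d(10) \approx \left[1 - (0.95)^{10}\right]^{10}
               \approx (0.4013)^{10} \approx 0.0001
\end{align*}
```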

Table 5.2 is an extension of Table 3.1 and gives values of the sampling
fraction corresponding to various values of A, k, and P_d(k). As an example of
the use of Table 5.2, suppose it is desired to ensure that the probability of
detection is at least 99% for simultaneous detection of 5 groups of
defectives, each of which contains 600 defectives. For this example, we wish
to find the sampling fraction, f, in Table 5.2 corresponding to k = 5 groups,
probability of detection P_d(k) = 99%, and total number of defective rolls
A = 5 x 600 = 3000. The tabulated value of f is .010, which implies that a 1%
sample  of  the  population  will  be  required.  If  the  population  contains  124,000 
rolls  of  microfilm,  for  example,  the  required  sample  size  would  be  1%  of 
124,000  or  1,240  rolls  to  be  inspected. 

                              Table 5.1.

              Probability of Detection, P_d(k), in Percent,
                   for Various Values of A, f, and k.


  No. of     Number of            Sampling Fraction, f = n/N
  Groups,    Defective
  k          Rolls, A       .001      .005      .01       .05       .10

   1             30          3.0      14.0      26.0      78.5      95.8
   1             50          4.9      22.2      39.5      92.3      99.5
   1            100          9.5      39.4      63.4      99.4      99.9+
   1            300         25.9      77.8      95.1      99.9+     99.9+
   1           1000         63.2      99.3      99.9+     99.9+     99.9+
   1           3000         95.0      99.9+     99.9+     99.9+     99.9+

   5             30          0.0       0.0       0.0       0.1       2.3
   5             50          0.0       0.0       0.0       1.0      11.7
   5            100          0.0       0.0       0.02     10.9      52.3
   5            300          0.0       0.1       1.9      79.0      99.1
   5           1000          0.02     10.2      48.7      99.9+     99.9+
   5           3000          1.9      77.6      98.8      99.9+     99.9+

  10             30          0.0       0.0       0.0       0.0       0.0
  10             50          0.0       0.0       0.0       0.0       0.01
  10            100          0.0       0.0       0.0       0.01      1.4
  10            300          0.0       0.0       0.0       8.9      64.8
  10           1000          0.0       0.01      1.0      94.2      99.9+
  10           3000          0.0       8.1      60.5      99.9+     99.9+


                              Table 5.2.

                      Sampling Fraction, f = n/N,
                for Various Values of A, k, and P_d(k).


  No. of     Number of           Probability of Detection, P_d(k)
  Groups,    Defective
  k          Rolls, A       50%       75%       90%       95%       99%

   1             30         .023      .045      .074      .095      .142
   1             50         .014      .027      .045      .058      .088
   1            100         .0069     .014      .023      .030      .045
   1            200         .0035     .0069     .011      .015      .023
   1            300         .0023     .0046     .0076     .0099     .015
   1            600         .0012     .0023     .0038     .0050     .0076
   1           1000         .00069    .0014     .0023     .0030     .0046
   1           3000         .00023    .00046    .00077    .0010     .0015

   5             30         .29       .38       .48       .53       .64
   5             50         .18       .25       .32       .37       .46
   5            100         .097      .13       .18       .20       .27
   5            200         .050      .070      .092      .11       .14
   5            300         .034      .047      .062      .074      .098
   5            600         .017      .024      .032      .038      .050
   5           1000         .010      .014      .019      .023      .031
   5           3000         .0034     .0048     .0064     .0076     .010

  10             30         .59       .70       .78       .83       .90
  10             50         .42       .51       .60       .65       .75
  10            100         .24       .30       .37       .41       .50
  10            200         .13       .16       .20       .23       .29
  10            300         .086      .11       .14       .16       .21
  10            600         .044      .058      .073      .084      .11
  10           1000         .027      .035      .045      .051      .067
  10           3000         .0090     .012      .015      .017      .023

6. Stratification


Stratification refers to the process of dividing the population into
non-overlapping subpopulations, or strata, that will be treated separately in the
sample  survey.  Since  the  stratification  is  done  before  sampling,  the  frame 
must  contain  enough  information  to  unequivocally  assign  each  sample  unit  to 
one  and  only  one  stratum  before  sampling. 

There are four major reasons for stratification that might apply to this
survey of microfilm: administrative convenience (including the need to
prepare separate summary reports for each stratum), statistical efficiency,
existence of strata having significantly different historical or monetary
value, and allocation of resources for future surveys.

Administrative  convenience.  It  is  often  useful  to  stratify  a large 
population  into  smaller  pieces  so  that  survey  work  can  be  organized 
independently  in  each  stratum.  An  additional  advantage  is  that  summary 
reports  and  any  statistical  analyses  of  the  data  from  each  stratum  can  be 
carried  out  separately. 

Statistical  efficiency.  The  theory  outlined  in  sections  3 and  4 implies 
that  the  probability  of  detection  depends  on  A,  the  total  number  of  defective 
rolls in the subpopulation (or stratum), and on k, the number of identifiable
groups  to  which  the  defective  rolls  belong.  This  implies  that  an  efficient 
stratification  will  divide  the  population  according  to  expected  values  of  A 
and  k. 

For example, one stratum might consist of rolls for which the sizes of
the identifiable subgroups (possible values of A if the rolls turn out to be
defective) are all relatively large compared to another stratum consisting of
smaller group sizes. Organizing the population into strata in this way would
have the advantage of making it possible to sample a larger fraction of the
stratum consisting of smaller groups. Similarly, if a stratum could be formed
in which it was expected that at most k = 1 group could contain defective
rolls, a relatively small sampling fraction would suffice, saving a heavier
sampling effort for another stratum in which k > 1 is expected.
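
A rough sketch of the saving described above, under purely hypothetical assumptions: two strata whose sizes are borrowed from the 16 mm and 35 mm counts in Section 2, with the smallest group of concern assumed to be 1000 rolls in the first stratum and 100 rolls in the second, a single defective group (k = 1) in each, and a 95% detection target. None of these pairings comes from the report; they serve only to illustrate the arithmetic.

```python
import math


def sampling_fraction(p_d, A):
    """Equation (3.2): f = 1 - (1 - P_d)^(1/A)."""
    return 1.0 - (1.0 - p_d) ** (1.0 / A)


# Hypothetical strata: (name, rolls in stratum, smallest group size of concern)
strata = [
    ("large groups", 50_000, 1000),
    ("small groups", 74_000, 100),
]
target = 0.95  # desired probability of detection within each stratum

stratified_total = 0
for name, N, A in strata:
    f = sampling_fraction(target, A)
    n = math.ceil(f * N)
    stratified_total += n
    print(f"{name}: f = {f:.4f}, n = {n}")

# Without stratification, the whole population would have to be sampled
# at the fraction required by the smallest group size.
N_total = sum(N for _, N, _ in strata)
f_worst = max(sampling_fraction(target, A) for _, _, A in strata)
unstratified_total = math.ceil(f_worst * N_total)

print("stratified total  :", stratified_total)
print("unstratified total:", unstratified_total)
```

Under these assumptions the stratified design inspects noticeably fewer rolls in total, because the heavier sampling fraction is applied only where small groups are expected.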

Strata  of  significantly  different  historical  or  monetary  value.  A 
stratum  consisting  of  relatively  more  valuable  microfilm  can  be  sampled  more 
intensively  than  a less  valuable  stratum.  For  example,  only  camera  negative 
film  was  sampled  in  the  1984  survey  conducted  at  the  National  Archives. 

Planning  for  future  surveys.  If  experience  with  the  population  suggests 
that  one  stratum  is  significantly  more  at  risk  than  another,  a stratified 
approach  provides  for  the  possibility  of  sampling  the  lower  risk  stratum  less 
often  or  less  intensively  in  a periodic  inspection  program.  This  approach  was 
described  in  the  proposed  revision  to  the  Federal  Property  Management 
Regulations  on  Micrographics  Records  Management  that  the  author  helped  prepare 
in  November,  1986.  Relevant  portions  of  that  document  are  reproduced  in 
Appendix 2.

7. Sample Selection Procedure

In  this  section,  it  is  assumed  that  a suitable  list  frame,  as  described 
in  Section  2,  is  available  for  use  in  sample  selection.  The  sampling 
procedure  that  will  be  described  is  a systematic  sampling  scheme  which  is 
practical  to  implement  and  which  can  be  used  for  list  frames  having  entries 
that  represent  more  than  one  roll  of  microfilm.  The  details  of  the 
recommended  sampling  procedure  will  be  described  and  illustrated  by  working 
through  a hypothetical  example. 

Step 1. By referring to Table 5.1 and/or Table 5.2, choose the appropriate
sampling fraction, f, to achieve the desired probability of detection,
P_d(k), for relevant values of A, the assumed number of defectives in the
population, and k, the assumed number of equal groups to which the A
defectives belong.

Example: Suppose we want to design a survey so that there will be a
90% chance of detecting the presence of each of 10 groups that
contain at least 100 defectives, should such groups exist. In this
example we want P_d(k) = .90, for k = 10 groups and A = 100 x 10 = 1000
total defectives. From Table 5.2 (which lists the rounded value .045)
and equation (4.2), we find that a sampling fraction of f = 0.0446,
or 4.46% of the population, will be required to guarantee a 90%
chance of detecting at least one defective roll from each of the
assumed 10 groups of defectives.

Step  2.  Calculate  the  sampling  interval,  S,  by  the  formula  S = 1/f  and  round 
down  to  the  nearest  whole  number.  The  sample  for  inspection  will  be  chosen  as 
"every  Sth"  unit  in  the  population. 

Example: Given that f = 0.0446, we find S = 1/(0.0446) = 22.42.
Rounding down to the nearest whole number yields S = 22. Thus the
sample will consist of every 22nd unit from the population.

Step  3.  Choose  a random  starting  number,  R,  between  1 and  S.  The  first  unit 
selected  will  be  serial  number  R and  the  rest  of  the  sample  will  be  chosen  as 
every  Sth  unit  after  that. 

Example: Consulting a table of random numbers, we locate a "random"
starting point by placing a finger on the table without looking.
Then, scanning down the located column (or across the row or moving
diagonally; it doesn't matter since the table is completely
random) the first (two-digit) number encountered between 01 and 22
(= S) is 09, say. Thus the random starting number will be R = 9.
The sample rolls of microfilm will consist of serial numbers 9,
31 (= 9+22), 53 (= 31+22), 75 (= 53+22), etc.

Step 4. Assign "serial numbers" to all the rolls of microfilm in the
population and select the sample rolls.

Example: Table 7.1 below represents a hypothetical population which
will be used to illustrate the sample selection procedure. The
population consists of 68146 rolls of microfilm represented in a
list frame with 5000 entries. The population is to be sampled using
random start R = 9 and sampling interval S = 22.


                              Table 7.1

                 Hypothetical Population for Example

  List Entry   Number of Rolls   Serial Numbers    Serial Numbers for
    Number        Per Entry         Assigned      Selected Sample Rolls
  ----------   ---------------   --------------   ---------------------
       1             23               1 - 23                9
       2             11              24 - 34               31
       3              7              35 - 41                -
       4             19              42 - 60               53
       5              6              61 - 66                -
       6             12              67 - 79               75
       7             14              80 - 94                -
       8             28              95 - 123            97, 119
       9              5             124 - 128               -
      10             13             129 - 132               -
     ...            ...                ...                 ...
    5000             12           68134 - 68146           68143

               Total = 68146                      Total number of rolls
               rolls in population                selected for the sample
                                                  would be 3097.

Notice that two rolls (serial numbers 97 and 119) are to be
selected from list entry number 8. This will happen occasionally
for list entries that correspond to more than 22 rolls (or, more
generally, S rolls) of microfilm.

In assigning the serial numbers, it is not necessary to actually number
every roll in the population. The serial numbers are used simply as a
conceptual device to keep track of the varying numbers of rolls per list entry
and, in effect, to ensure that every roll in the population has an equal chance
of being selected into the sample.
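
The bookkeeping in Step 4 amounts to a running total of rolls per list entry. The sketch below is an illustration under stated assumptions: the function name `assign_selected_serials` is hypothetical, and the serial ranges are recomputed from the per-entry roll counts alone, so they need not coincide digit for digit with the ranges printed in Table 7.1. Only the first ten list entries of the table are used.

```python
from bisect import bisect_left
from itertools import accumulate


def assign_selected_serials(rolls_per_entry, start, interval):
    """Map systematically selected serial numbers to list entries.

    Returns {entry_number: [selected serial numbers]} using 1-based entry
    numbers; entries that receive no selected serial number are omitted.
    """
    # upper[i] is the last serial number belonging to entry i + 1.
    upper = list(accumulate(rolls_per_entry))
    selected = {}
    for serial in range(start, upper[-1] + 1, interval):
        # First entry whose last serial number is >= the selected serial.
        entry = bisect_left(upper, serial) + 1
        selected.setdefault(entry, []).append(serial)
    return selected


rolls = [23, 11, 7, 19, 6, 12, 14, 28, 5, 13]  # roll counts, entries 1-10
picked = assign_selected_serials(rolls, start=9, interval=22)
for entry, serials in picked.items():
    print(f"list entry {entry}: serial numbers {serials}")
```

No individual roll is ever numbered; only the cumulative totals are needed, which is the point made in the paragraph above.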

Step 5. Select the individual microfilm rolls to be inspected by making
random selections from the list entries identified in step 4.

Example: The procedure described in Table 7.1 results in selection
of a set of list entries corresponding to the sample of microfilm
rolls to be inspected. The final selection of the individual
microfilm rolls from the selected list entries is accomplished by
simple random sampling, as follows: First, physically locate all of
the microfilm rolls that belong to a chosen list entry and count them.
It may be expected that the actual number of rolls for a list entry
will not always be exactly the number predicted. In the
hypothetical example, we may imagine that list entry #1 is actually
found to contain 26 rolls, rather than the 23 rolls that were
assumed when Table 7.1 was constructed. The sample selection is
completed by drawing one roll at random from the actual number of
rolls found, using a table of random numbers or a computerized
random number generator. In the example, a random number between 1
and 26 would be drawn and the corresponding roll selected for
inspection. Continuing the example, note that two rolls would be
drawn at random from the group corresponding to list entry #8
because two serial numbers were identified as belonging to that
group.
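
A computerized version of the final draw might look like the following sketch. The helper name `draw_rolls` is hypothetical, the fixed seed is used only so that the sketch is repeatable, and drawing without replacement (so that the two rolls from list entry #8 are distinct) is an assumption about how the step would be carried out. The count of 28 rolls for entry #8 simply reuses the Table 7.1 figure.

```python
import random


def draw_rolls(actual_count, n_selected, rng):
    """Draw n_selected distinct roll positions (1-based) at random from
    the rolls actually found for a list entry."""
    if n_selected > actual_count:
        raise ValueError("more rolls requested than were found for this entry")
    return sorted(rng.sample(range(1, actual_count + 1), n_selected))


rng = random.Random(1987)          # fixed seed: repeatable illustration only
entry_1 = draw_rolls(26, 1, rng)   # entry #1: 26 rolls found, one to inspect
entry_8 = draw_rolls(28, 2, rng)   # entry #8: two serial numbers fell here
print("entry 1:", entry_1, " entry 8:", entry_8)
```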
8. Use of a Geographic Frame for Sampling


The term "geographic frame" is used here to mean a conceptual frame that
identifies each roll of microfilm in the population with its physical storage
location. A geographic frame can be used to obtain reasonably complete
coverage of the population, with the exception that if some group of film is
heavily used, items from that group would tend to be under-represented in the
sample because items in use at the time of the survey would be unavailable for
inspection. This type of frame is less desirable than a list because it is
less stable over time and does not lend itself to creating and maintaining
records of data obtained from repeated surveys. However, if a list frame is
not available and cannot be constructed in a timely and cost-effective
manner, useful results still can be obtained from a survey conducted with a
geographic frame.

The  procedure  for  sampling  from  a geographic  frame  follows  essentially 
the  same  steps  described  in  Section  7 of  this  report,  with  appropriate 
adaptations.  In  particular,  the  "list  entries"  are  replaced  by  convenient 
physical  storage  units,  such  as  storage  shelves  or  drawers. 

To  begin,  the  sampling  fraction,  sampling  interval,  and  random  starting 
number  are  chosen  following  exactly  the  same  methods  described  in  steps  1,  2, 
and  3 in  Section  7.  The  number  of  rolls  of  microfilm  on  each  shelf  (using 
shelf  in  place  of  list  entry)  must  be  determined  and  used  to  assign  serial 
numbers  to  the  rolls  as  was  illustrated  in  Table  7.1.  The  shelves  are 
selected  systematically  in  the  same  way  list  entries  were  selected  in  Table 
7.1.  Similarly,  the  number  of  rolls  that  will  be  inspected  on  a given  shelf 
is  determined  by  the  number  of  selected  serial  numbers  that  happen  to  belong 
to  that  shelf.  Finally,  the  individual  rolls  are  chosen  at  random  from  the 
rolls  on  selected  shelves  following  the  method  described  in  step  5 of 
Section  7. 

9. Other Practical Considerations for Sampling


Pilot Test. Any survey that is planned will need a well-defined data
collection procedure and some sort of form or questionnaire for recording the
field data. These procedures should be tested by use in a realistic pilot
survey of some small portion of the population. This test need not be large
(inspection of 10 or 20 rolls of microfilm should suffice), but it should be
conducted by one or more of the inspectors who will be involved in the
full-scale survey. A pilot test will almost always lead to improvements in the
data recording form(s) and often uncovers serious deficiencies in the planned
survey procedures.

Replication. The easiest, and often most convincing, way to evaluate the
statistical  uncertainty  in  a survey  is  to  repeat  it  and  compare  results.  In 
fact,  replication  can  be  built  into  a survey  by  simply  dividing  the  workload 
into  approximately  equal  pieces  and  conducting  the  survey  in  parallel  and 
independently  (e.g.  different  inspectors,  equipment,  data  summarization)  on 
each  piece.  This  method  of  organizing  survey  work,  using  about  4 to  10 
subsamples,  has  been  used  extensively  and  very  effectively  in  practical  work 
in  many  fields.  Examples  are  described  by  Deming  (1960,  Chapter  6)  and  Sudman 
(1976,  pp.  171-178). 

As a concrete example, suppose that it is desired to take a 2% sample
(i.e. 1 in 50) of a population. The survey could be divided into 4 subsamples
by having each of 4 inspectors conduct a 1 in 200 sample (or 0.5% of the
population). To make the 4 subsamples as comparable as possible, it would be
well to use so-called "interpenetrating" subsamples by having each of the 4
inspectors choose systematic samples with different random starts, but
counting from the same point of origin in the population.
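
The interpenetrating design in this example can be sketched as follows. This is an illustration under stated assumptions: the function name `interpenetrating_subsamples` is hypothetical, the population size of 68146 is borrowed from the hypothetical population of Table 7.1, and a seeded pseudo-random generator stands in for a table of random numbers.

```python
import random


def interpenetrating_subsamples(population_size, n_subsamples, interval, rng):
    """Return n_subsamples systematic samples that share one origin and one
    sampling interval but use different random starts between 1 and interval."""
    starts = rng.sample(range(1, interval + 1), n_subsamples)  # distinct starts
    return [list(range(s, population_size + 1, interval)) for s in sorted(starts)]


rng = random.Random(0)  # fixed seed: repeatable illustration only
subsamples = interpenetrating_subsamples(68146, 4, 200, rng)

total = sum(len(s) for s in subsamples)
print("rolls per inspector:", [len(s) for s in subsamples])
print(f"overall sampling fraction: {total / 68146:.4f}")
```

Because the four starts are distinct and the interval is common, no roll can fall in two subsamples, and the four 1-in-200 samples together make up approximately the desired 1-in-50 sample.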

Appendix 1.


SUMMARY  OF  PHYSICAL  LAYOUT  AT  NUS  IN  BOYERS,  PA 


This appendix is based on notes taken by Alan Calmes during an August, 1985,
visit to National Underground Storage, Inc. in Boyers, Pennsylvania. The raw
summary data are reproduced in Table A1.1. A sketch of the physical layout of
the facility is given in Figure 3.


Observations  on  Physical  Layout  at  NUS 

• There are about 49,654 rolls of 16 mm film (119 million frames)
                + 74,326 rolls of 35 mm film ( 89 million frames)
  for a total of about 123,980 rolls or 208 million frames
  in the entire storage area at NUS.

• These rolls take up only about 56 compartments (10 shelves per
  compartment).

• The number of rolls per compartment is quite variable:
    - many have 3000 rolls per compartment
    - many have 1600 rolls per compartment
    - many have 2112 rolls per compartment
    - Max = 5416
    - Min = 100
    - Mean = 2200

• Most compartments have either 16 mm or 35 mm film, but not both.

                              Table A1.1

       Rough Count of Number of Rolls of Stored Microfilm at NUS
   (based on A. Calmes' notes from August, 1985, visit to Boyers, PA)


                               No. of 100 foot rolls*
     Aisle    Compartment        16 mm       35 mm
     -----    -----------        -----       -----
       1           2               -          3000
                   3               -          3000
                   4               -          3000
                   5               -          3000
                   6               -          3000
                   7               -          3000
                   8               -          3000

       2           3               -          3000
                   4               -          1660
                   5               -          2954
                   6               -          3780
                   7               -          3780
                   8               -          3780

       3           3               -          3600
                   4               -          3600
                   5               -          3600
                   6             1700         1040
                   7               -          1600
                   8               -          1600

       4           2               -          1600
                   3               -          1600
                   4               -          1600
                   5               -          1600
                   6              680         2200
                   7              544         1280
                   8               -          1600

       5           3               -          1600
                   4               -          1600
                   5             1520          800
                   6             3040           -
                   7             3040           -
                   8             2702           -
                   9             2112           -
                  10             2112           -

                         Table A1.1, Continued


                               No. of 100 foot rolls*
     Aisle    Compartment        16 mm       35 mm
     -----    -----------        -----       -----
       6           2             2112           -
                   3             2112           -
                   4             2112           -
                   5             2112           -
                   6             2112           -
                   7             2112           -
                   8             2112           -
                   9             5416           -

       7           2             2396           -
                   3             2720           -
                   4             2176          264
                   5              304         1408
                   6             1976          480
                   7             2432          160
                   8               -          1440
                   9               -           100

     Subtotals                  49,654       74,326

     Grand Total               123,980 rolls

* counts each 1000 foot roll as 10 x 100 foot rolls.

[Figure 3 (sketch not reproduced): LAYOUT OF NUS FACILITY. Annotations on
the sketch: "Aisle 1, Compartment 2 through Aisle 7, Compartment 9 are in
use." "Remainder is empty shelving."]

Figure 3. Sketch of the NARA underground storage facility at National
Underground Storage, Inc., Boyers, PA. August, 1985.

Appendix 2.


SELECTED  PORTIONS  OF  THE  NOVEMBER,  1986,  DRAFT  REVISION  TO 

36  CFR  Part  1230,  Subpart  B 

Standards  for  the  Storage,  Use  and  Disposition  of  Microform  Records. 
Section  1230.22 

(a)  Permanent  Records 
(1)  Unstratified  samples. 

(i)  Master  films  of  permanent  and  unscheduled  records  microfilmed  to 
dispose  of  the  original  record  shall  be  inspected  on  a 3-year  cycle.  At  each 
cycle,  the  inspection  shall  be  made  on  a randomly  selected  sample  consisting 
of  1000  microform  units,  or  1%  of  the  total  number  of  microform  units  in  the 
collection,  whichever  is  smaller.  The  term  "microform  unit"  refers  to  a 
single [100'] roll of microfilm, a microfiche, or similar appropriate unit for
inspection.

(ii)  To  facilitate  inspection,  an  inventory  of  microfilm  must  be 
maintained,  listing  each  microform  series/publication  by  production  date, 
producer,  processor,  format,  and  results  of  previous  inspections. 

(iii)  The  elements  of  the  inspection  shall  consist  of  (1)  an  inspection 
for  aging  blemishes  following  the  guidelines  in  the  AIIM/MS  HB96  [AIIM  Special 
Interest  Publication  #34,  Association  for  Information  and  Image  Management, 
Silver  Spring,  Maryland];  (2)  a rereading  of  resolution  targets;  (3)  a 
remeasurement  of  density;  and  (4)  a certification  of  the  environmental 
conditions under which the microforms are stored, as specified in sec.
1230.20(a).


(iv)  An  inspection  log  shall  be  maintained.  Information  to  be  contained 
in the log shall include (1) a complete description of all records tested
(title;  number  or  identifier  for  each  unit  of  film;  and  inclusive  dates, 
names,  or  other  data  identifying  the  records  on  the  unit  of  film);  (2)  the 
date  of  inspection;  (3)  the  elements  of  inspection;  (4)  the  defects  uncovered; 
and  (5)  the  corrective  action  taken.  In  addition,  the  log  shall  contain  the 
results  of  all  archival  film  tests  required  by  sec.  1230.14. 

(v)  The  results  of  the  inspection  shall  be  reported  to  the  Office  of 
Records  Administration,  National  Archives  (NI),  Washington,  DC  20408,  30  days 
after  the  inspection  is  completed.  Reports  shall  include  (1)  the  quantity  of 
microform  records  on  hand,  i.e.,  number  of  rolls,  number  of  microfiche,  etc.; 
(2)  the  quantity  of  microforms  inspected;  (3)  the  condition  of  the  microforms; 
(4)  any  defects  discovered;  and  (5)  corrective  action  taken.  A copy  of  the 
inspection  report  shall  be  stored  with  the  microfilm  and  have  the  same 
retention  period  as  the  microfilm. 

(2) Stratified samples


(i) When the records required by sec. 1230.22(b) are maintained, it may
be possible to divide a microform collection into two strata, with one stratum
having an appreciably lower risk of deterioration than the other. This is
determined by analysis of the information derived from previous inspections.
One stratum consists of microform series/publications which showed no signs of
deterioration in past inspections. In such circumstances, it will be
sufficient to inspect the latter, lower risk stratum only in alternate
inspection cycles, i.e., every 6 years, while continuing to inspect the
former, higher risk stratum every 3 years. When the population is stratified
in this way, the sample size for inspection may, in some cases, be reduced in
alternate inspection cycles, as follows: when the inspection schedule calls
for inspection of only the higher risk stratum, the required sample size may
be computed as the smaller of 1000 units or 1% of the high risk stratum only.

(ii)  When  stratification  is  used  in  the  inspection  program,  the  stratum 
definitions  must  be  well  documented,  including  the  reasons  used  to  determine 
which  microform  records  would  be  placed  in  the  respective  strata.  This 
information  must  be  included  in  the  inspection  report  submitted  to  the 
National  Archives. 

(iii)  This  stratification  of  the  collection  shall  not  be  used  for  an 
inspection  cycle  until  an  inspection  of  the  entire  collection  has  been 
conducted at least twice. In any case, the inspection procedures shall follow
the unstratified plan described in sec. 1230.22(a) above at least every 6
years.

Appendix 3.


MATHEMATICAL  DEVELOPMENT  OF  FORMULAS  FOR 


A3.1 Derivation of Formulas for Probability of Detection

The formulas for the probability of detection, P_d, are derived as
follows.

Define the random variable X to be the number of defective rolls of
microfilm drawn into a simple random sample of size n. In summary, the
essential quantities are denoted as:

    N = population size (total number of rolls)
    n = sample size
    f = n/N, the sampling fraction
    A = number of defective rolls in the population
    X = number of defective rolls drawn in a simple random sample
        (without replacement) of size n


The probability law of the random variable X is the Hypergeometric
Distribution (Cochran, 1977, pp. 55-57). Thus P_d can be calculated as

    $$ P_d = P(X \ge 1) = 1 - \binom{N-A}{n} \Big/ \binom{N}{n} $$        (A3.1)


The practical range of values of A will typically be less than 10% of the
population size, or else a survey sample would not be needed to locate
defective rolls. In this case, A/N is less than 0.1 and the probability in
(A3.1) can be well-approximated by use of the binomial distribution
(Schilling, 1982, pp. 64-65). Using the so-called f-binomial approximation
leads to the expression

    $$ P_d = 1 - (1 - f)^{A} $$        (A3.2)

Using equation (A3.2), one can solve for the sampling fraction, f = n/N,
in terms of P_d and A. The solution is

    $$ f = 1 - (1 - P_d)^{1/A} $$        (A3.3)

The theoretical model adopted here has been used previously by Schilling
(1978) in the development of a lot sensitive sampling plan for compliance
testing. As in the present application, Schilling's work deals with the
problem of trying to detect defective items in an isolated lot (or single
finite population) of items. Unfortunately, the tables provided in that work
are too specialized to be useful in the present application to microfilm
sampling. In particular, Schilling's main table applies only to the case
P_d = 0.90 and the smallest value of f given is 0.01.

It can be shown (Schilling, 1978) that the approximation used in equation
(A3.2) is conservative in that the exact value of P_d, from equation (A3.1), is
actually greater than or equal to the value calculated by the approximate
formula, (A3.2). This means that the simple formula (A3.3) for the sampling
fraction, and Tables 3.1 and 5.2 which were calculated from it, tend to yield
recommended sampling fractions that are slightly larger than would be found if
an exact calculation based on (A3.1) were performed.
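
The conservativeness of the approximation can be checked numerically. The sketch below compares the exact hypergeometric probability of detecting at least one defective roll, 1 - C(N-A, n)/C(N, n), with the approximation 1 - (1 - n/N)^A. The function names are illustrative assumptions, and the values N = 68146, A = 1000, and P_d = 0.90 are hypothetical inputs chosen only to resemble the examples in this report.

```python
import math


def p_detect_exact(N, n, A):
    """Exact probability of at least one defective in the sample, (A3.1)."""
    # math.comb uses exact integer arithmetic; Python's int/int division
    # rounds the ratio correctly even for very large integers.
    return 1.0 - math.comb(N - A, n) / math.comb(N, n)


def p_detect_approx(N, n, A):
    """f-binomial approximation 1 - (1 - f)**A with f = n/N, (A3.2)."""
    return 1.0 - (1.0 - n / N) ** A


N, A, target = 68146, 1000, 0.90
f = 1.0 - (1.0 - target) ** (1.0 / A)   # formula (A3.3)
n = math.ceil(f * N)                    # round the sample size up

exact = p_detect_exact(N, n, A)
approx = p_detect_approx(N, n, A)
print(f"n = {n}, approximate P_d = {approx:.4f}, exact P_d = {exact:.4f}")
```

For these inputs the exact probability exceeds the approximate one, so a sample size chosen from (A3.3) meets or slightly exceeds the target probability of detection.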


A3.2 Effect of Distributing a Given Number of Defective Rolls Across Several
Groups

The formulas in section A3.1 assume that all the defective microfilm
rolls  belong  to  the  same  identifiable  group.  If  instead  the  A defective  rolls 
are  assumed  to  belong  to  several  logical  groups,  it  would  be  necessary  to  get 
at  least  one  representative  from  each  group  in  the  sample  in  order  to  be  able 
to  locate  all  the  defective  rolls  in  the  follow-up  phase  of  the  inspection 
program. 

Formally, we consider the case in which the A defective rolls are
distributed equally among k groups, each of size A/k. The relevant
probability of detection is the probability that at least one member of each
group is drawn in the random sample. Let

    X_i = number of members of group i in the sample
          (i = 1, ..., k), and

    P_d(k) = probability that X_i >= 1 for all i, 1 <= i <= k.

Treating the X_i as approximately independent random variables, it follows from
(A3.2) that

    $$ P_d(k) = \left[ 1 - (1 - f)^{A/k} \right]^{k} $$        (A3.4)

Formula (A3.4) was used in the construction of Figure 2.

As was the case for a single group, equation (A3.4) can easily be solved
for the sampling fraction in terms of the other parameters. The solution is

    $$ f = 1 - \left[ 1 - P_d^{1/k} \right]^{k/A} $$        (A3.5)
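
For completeness, the algebra leading from (A3.4) to (A3.5) can be written out step by step. The derivation below is a reconstruction supplied for illustration; it uses only the symbols defined above and writes P_d for P_d(k).

```latex
\begin{align*}
P_d &= \left[\,1 - (1-f)^{A/k}\,\right]^{k}
      && \text{(A3.4)} \\
P_d^{1/k} &= 1 - (1-f)^{A/k}
      && \text{take the $k$th root of both sides} \\
(1-f)^{A/k} &= 1 - P_d^{1/k}
      && \text{rearrange} \\
1 - f &= \left[\,1 - P_d^{1/k}\,\right]^{k/A}
      && \text{raise both sides to the power $k/A$} \\
f &= 1 - \left[\,1 - P_d^{1/k}\,\right]^{k/A}
      && \text{(A3.5)}
\end{align*}
```

Setting k = 1 recovers the single-group result (A3.3), f = 1 - (1 - P_d)^{1/A}.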

A3.3 Relation to National and International Standards on Sampling


The  author  is  aware  that  it  would  have  been  desirable  to  refer  to 
national  or  international  standards  for  rules  on  choosing  appropriate  sample 
sizes.  The  following  discussion  explains  why  this  was  not  possible. 

The  sampling  standards  that  are  most  relevant  to  the  microfilm  inspection 
problem  are  MIL-STD-105D  (U.S.  Department  of  Defense,  1963)  and  its 
international adaptation, ISO 2859 (ISO, 1974). Both are discussed in detail
in Schilling (1982).

In parallel with this report, the tables and rules given in the standards
are based on the hypergeometric probability model, and on binomial and Poisson
approximations to that model (see, for example, paragraph 11.1 of ISO 2859).
Further correspondences in notation and concepts between those standards and
the present report are as follows:


    This Work        ISO 2859 and MIL-STD-105D
    ---------        -------------------------
       P_d           1 - P_a  (P_a = probability of acceptance)
       A/N           LQ, Limiting Quality
        n            n, Sample Size
        N            Lot or Batch Size


An  important  difference  between  this  microfilm  inspection  problem  and  the 
problems  addressed  by  MIL-STD-105D  and  ISO  2859  is  that  the  sampling  standards 
are  designed  to  control  the  proportion  of  defective  items  in  a long  series  of 
lots  or  batches.  In  contrast,  there  is  only  one  "lot"  of  microfilm  to  be 
inspected  in  the  problem  considered  here. 

These  standards  do  describe,  as  a secondary  application,  the  use  of  their 
tables  for  sampling  "isolated  lots,"  a situation  which  more  nearly  matches  the 
microfilm inspection problem. Unfortunately, the values of P_d (= 1 - P_a) in
the  standard  tables  are  only  available  for  a very  limited  selection  of  values 
of  n and  N.  Hence  the  usefulness  of  the  standards  is  similarly  limited  in  the 
present  context.  In  particular,  the  plots  and  tables  given  in  sections  3 and 
5 of  this  report  could  not  have  been  derived  from  the  sparse  values  given  in 
the  tables  in  the  standards. 

A3.4 Comparison With Textbook Formulas for Sample Size


Equations (A3.3) and (A3.5) are significantly different from the formulas
for sample size usually given in textbooks on sampling. Those sources use a
different criterion based on specifying the desired length of a confidence
interval for the estimated proportion of defective items in a population,
rather than the criterion used in this report based on P_d. Specifically, the
usual formula for sample size (Cochran, 1977, section 4.4) is:

    $$ n = \frac{n_0}{1 + (n_0 - 1)/N}, \qquad n_0 = \frac{t^2 P Q}{d^2} $$        (A3.6)

where

    N = population size
    P = proportion of defective items in the population ( = A/N )
    Q = 1-P
    t = 1.96 for 95% confidence level (for example)
    d = desired bound on the error of estimation (half-width of
        a 95% confidence interval)

For a large population (N large relative to n) the value of n is well-
approximated by simply calculating n_0. The formula for n_0 is widely used for
planning surveys (e.g. MSTC, 1981) to good effect when the goal of the survey
is to estimate the proportion of defective items in the population.

The  effect  of  the  population  size  in  formula  (5.1)  is  significantly 
different  in  character,  as  well  as  detail,  in  comparison  to  the  recommended 
formula  (3.2).  Specifically,  the  textbook  formula  (5.1)  is  only  weakly 
dependent  on  population  size  whenever  N is  large  compared  to  n (Cochran,  1977, 
page  76).  In  contrast,  the  recommended  formula  (3.2)  shows  that  n should  be 
chosen  to  be  directly  proportional  to  N. 
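
The contrast can be made concrete with a short calculation. The sketch below evaluates the textbook formula n = n_0 / (1 + (n_0 - 1)/N), with n_0 = t^2 PQ/d^2, alongside the detection-based sample size n = fN with f = 1 - (1 - P_d)^(1/A), for several population sizes. The function names and all numerical inputs (P = 0.01, d = 0.005, A = 1000, P_d = 0.90) are hypothetical values chosen only for illustration.

```python
def cochran_n(N, P, d, t=1.96):
    """Textbook sample size for estimating a proportion P to within +/- d."""
    n0 = t * t * P * (1.0 - P) / (d * d)
    return n0 / (1.0 + (n0 - 1.0) / N)


def detection_n(N, A, p_d=0.90):
    """Sample size n = f * N needed to detect at least one of A defectives
    with probability p_d, using f = 1 - (1 - p_d)**(1/A)."""
    f = 1.0 - (1.0 - p_d) ** (1.0 / A)
    return f * N


for N in (10_000, 100_000, 1_000_000):
    print(f"N = {N:>9,}: textbook n = {cochran_n(N, P=0.01, d=0.005):8.0f}, "
          f"detection n = {detection_n(N, A=1000):8.0f}")
```

Under these assumptions the textbook sample size changes very little once N is large, while the detection-based sample size grows in direct proportion to N for a fixed number of defective rolls A.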

In summary, the reason the sample size formulas in this report are so
different from the usual formulas is that a different criterion has been used,
based on P_d, to determine the required sample size. The sample size formulas
given in textbooks are derived by considering how large a sample is needed to
estimate the proportion of defective rolls with a 95% confidence interval of a
given length. Our concern here is not with estimation, but rather with
detection of a group of defective rolls of microfilm.